  • Random Forest Weighted Local Fréchet Regression with Random Objects

    Updated: 2024-07-31 16:08:59
    Rui Qiu, Zhou Yu, Ruoqing Zhu. 25(107):1–69, 2024. Abstract: Statistical analysis is increasingly confronted with complex data from metric spaces. Petersen and Müller (2019) established a general paradigm of Fréchet regression with complex metric-space-valued responses and Euclidean predictors. However, the local approach therein involves nonparametric kernel smoothing and suffers from the curse of dimensionality. To address this issue, in this paper we propose a novel random forest weighted local Fréchet regression paradigm. …
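
    The abstract is truncated here, but the key mechanism is that leaf co-membership in a fitted random forest supplies the local weights that kernel smoothing would otherwise provide. Below is a minimal, hypothetical sketch of that idea (not the authors' implementation): a scikit-learn forest, leaf-based weights, and a weighted Fréchet mean, which for Euclidean responses reduces to a weighted average.

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 5))                     # Euclidean predictors
      Y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)  # stand-in for a metric-space response

      # Fit a forest; only its partition (leaf structure) is used to build weights.
      forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, Y)

      def forest_weights(forest, X_train, x0):
          """Weight of each training point: how often it shares a leaf with x0, per tree."""
          leaves_train = forest.apply(X_train)          # (n_train, n_trees) leaf indices
          leaves_x0 = forest.apply(x0.reshape(1, -1))   # (1, n_trees)
          w = np.zeros(len(X_train))
          for t in range(leaves_train.shape[1]):
              in_leaf = leaves_train[:, t] == leaves_x0[0, t]
              w[in_leaf] += 1.0 / in_leaf.sum()         # normalize within each tree's leaf
          return w / leaves_train.shape[1]

      x0 = np.zeros(5)
      w = forest_weights(forest, X, x0)

      # Weighted Fréchet mean: argmin_m sum_i w_i * d(Y_i, m)^2.
      # With Euclidean responses this is the weighted average; for a general metric
      # space one would minimize the weighted sum of squared distances numerically.
      prediction = np.average(Y, weights=w)
      print(prediction)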

  • Unsupervised Anomaly Detection Algorithms on Real-world Data: How Many Do We Need?

    Updated: 2024-07-31 16:08:59
    Roel Bouman, Zaharah Bukhsh, Tom Heskes. 25(105):1–34, 2024. Abstract: In this study we evaluate 33 unsupervised anomaly detection algorithms on 52 real-world multivariate tabular data sets, performing the largest comparison of unsupervised anomaly detection algorithms to date. On this collection of data sets, the Extended Isolation Forest (EIF) algorithm significantly outperforms most other algorithms. Visualizing and then clustering the relative performance of the considered algorithms on all data sets, we …

  • Multi-class Probabilistic Bounds for Majority Vote Classifiers with Partially Labeled Data

    Updated: 2024-07-31 16:08:59
    Vasilii Feofanov, Emilie Devijver, Massih-Reza Amini. 25(104):1–47, 2024. Abstract: In this paper, we propose a probabilistic framework for analyzing a multi-class majority vote classifier in the case where training data is partially labeled. First, we derive a multi-class transductive bound over the risk of the majority vote classifier, which is based on the classifier's vote distribution over each class. Then, we introduce a mislabeling error model to analyze the error of the majority vote classifier in …

  • Information Processing Equalities and the Information–Risk Bridge

    Updated: 2024-07-31 16:08:59
    Robert C. Williamson, Zac Cranko. 25(103):1–53, 2024. Abstract: We introduce two new classes of measures of information for statistical experiments which generalise and subsume φ-divergences, integral probability metrics, N-distances (MMD), and (f, Γ)-divergences between two or more distributions. This enables us to derive a simple geometrical relationship between measures of information and the Bayes risk of a statistical decision problem, thus extending the variational φ-divergence representation to multiple distributions in an entirely …

  • Nonparametric Regression for 3D Point Cloud Learning

    Updated: 2024-07-31 16:08:59
    Xinyi Li, Shan Yu, Yueying Wang, Guannan Wang, Li Wang, Ming-Jun Lai. 25(102):1–56, 2024. Abstract: In recent years, there has been an exponentially increased amount of point clouds collected with irregular shapes in various areas. Motivated by the importance of solid modeling for point clouds, we develop a novel and efficient smoothing tool based on multivariate splines over triangulation to extract the underlying signal and build up a 3D solid model from the point cloud. The proposed method can denoise or deblur the point cloud …

  • AMLB: an AutoML Benchmark

    Updated: 2024-07-31 16:08:59
    Pieter Gijsbers, Marcos L. P. Bueno, Stefan Coors, Erin LeDell, Sébastien Poirier, Janek Thomas, Bernd Bischl, Joaquin Vanschoren. 25(101):1–65, 2024. Abstract: Comparing different AutoML frameworks is notoriously challenging and often done incorrectly. We introduce an open and extensible benchmark that follows best practices and avoids common mistakes when comparing AutoML frameworks. We conduct a thorough comparison of 9 well-known AutoML frameworks across 71 classification and 33 regression tasks. The differences between the AutoML frameworks are explored with …

  • Semi-supervised Inference for Block-wise Missing Data without Imputation

    Updated: 2024-07-31 16:08:59
    We consider statistical inference for single or low-dimensional parameters in a high-dimensional linear model under a semi-supervised setting, wherein the data are a combination of a labelled block-wise missing data set of a relatively small size and a large unlabelled data set. The proposed method utilises both labelled and unlabelled data without any imputation or removal of the missing observations. The asymptotic properties of the estimator are established under regularity conditions. Hypothesis testing for low-dimensional coefficients is also studied. Extensive simulations are conducted to examine the theoretical results. The method is evaluated on the Alzheimer’s Disease Neuroimaging Initiative data.

  • Optimal Locally Private Nonparametric Classification with Public Data

    Updated: 2024-07-31 16:08:59
    Yuheng Ma, Hanfang Yang. 25(167):1–62, 2024. Abstract: In this work, we investigate the problem of public data assisted non-interactive locally differentially private (LDP) learning with a focus on non-parametric classification. Under the posterior drift assumption, we for the first time derive the minimax optimal convergence rate with LDP constraint. Then, we present a novel approach, the locally differentially private classification tree, which attains the minimax optimal convergence rate. Furthermore, we design a …

  • Bayesian Regression Markets

    Updated: 2024-07-31 16:08:59
    Although machine learning tasks are highly sensitive to the quality of input data, relevant datasets can often be challenging for firms to acquire, especially when held privately by a variety of owners. For instance, if these owners are competitors in a downstream market, they may be reluctant to share information. Focusing on supervised learning for regression tasks, we develop a regression market to provide a monetary incentive for data sharing. Our mechanism adopts a Bayesian framework, allowing us to consider a more general class of regression tasks. We present a thorough exploration of the market properties, and show that similar proposals in the literature expose the market agents to sizeable financial risks, which can be mitigated in our setup.

  • Neural Feature Learning in Function Space

    Updated: 2024-07-31 16:08:59
    Xiangxiang Xu, Lizhong Zheng. 25(142):1–76, 2024. Abstract: We present a novel framework for learning system design with neural feature extractors. First, we introduce the feature geometry, which unifies statistical dependence and feature representations in a function space equipped with inner products. This connection defines function-space concepts on statistical dependence, such as norms, orthogonal projection, and spectral decomposition, exhibiting clear operational meanings. In particular, we associate each learning setting with a dependence …

  • Topological Node2vec: Enhanced Graph Embedding via Persistent Homology

    Updated: 2024-07-31 16:08:59
    Yasuaki Hiraoka, Yusuke Imoto, Théo Lacombe, Killian Meehan, Toshiaki Yachimura. 25(134):1–26, 2024. Abstract: Node2vec is a graph embedding method that learns a vector representation for each node of a weighted graph while seeking to preserve relative proximity and global structure. Numerical experiments suggest Node2vec struggles to recreate the topology of the input graph. To resolve this we introduce a topological loss term to be added to the training loss of Node2vec which tries to align the persistence diagram (PD) of …

  • Learning to Warm-Start Fixed-Point Optimization Algorithms

    Updated: 2024-07-31 16:08:59
    Rajiv Sambharya, Georgina Hall, Brandon Amos, Bartolomeo Stellato. 25(166):1–46, 2024. Abstract: We introduce a machine-learning framework to warm-start fixed-point optimization algorithms. Our architecture consists of a neural network mapping problem parameters to warm starts, followed by a predefined number of fixed-point iterations. We propose two loss functions designed to either minimize the fixed-point residual or the distance to a ground truth solution. In this way, the neural network predicts warm starts with the end-to-end goal …
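
    To make the architecture concrete, here is a hypothetical PyTorch sketch under simplified assumptions: the fixed-point operator T is an illustrative gradient-descent map on a toy quadratic (not one of the paper's applications), a small network maps problem parameters to warm starts, and training minimizes the fixed-point residual after K unrolled iterations.

      import torch
      import torch.nn as nn

      K = 10          # number of fixed-point iterations unrolled after the warm start
      STEP = 0.05     # step size of the (illustrative) gradient-descent operator

      # Fixed-point operator T(z; theta): one gradient step on f(z) = 0.5*||z - theta||^2,
      # whose fixed point is z* = theta.  Any contractive operator could stand in here.
      def T(z, theta):
          return z - STEP * (z - theta)

      net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 4))  # theta -> warm start
      opt = torch.optim.Adam(net.parameters(), lr=1e-3)

      for _ in range(500):
          theta = torch.randn(128, 4)        # a batch of problem parameters
          z = net(theta)                     # predicted warm start
          for _ in range(K):                 # predefined number of fixed-point iterations
              z = T(z, theta)
          loss = ((T(z, theta) - z) ** 2).sum(dim=1).mean()  # fixed-point residual loss
          opt.zero_grad()
          loss.backward()
          opt.step()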

  • Granger Causal Inference in Multivariate Hawkes Processes by Minimum Message Length

    Updated: 2024-07-31 16:08:59
    Katerina Hlaváčková-Schindler, Anna Melnykova, Irene Tubikanec. 25(133):1–26, 2024. Abstract: Multivariate Hawkes processes (MHPs) are versatile probabilistic tools used to model various real-life phenomena: earthquakes, operations on stock markets, neuronal activity, virus propagation and many others. In this paper, we focus on MHPs with exponential decay kernels and estimate connectivity graphs, which represent the Granger causal relations between their components. We approach this inference problem by …

  • Multi-Objective Neural Architecture Search by Learning Search Space Partitions

    Updated: 2024-07-31 16:08:59
    Yiyang Zhao, Linnan Wang, Tian Guo. 25(177):1–41, 2024. Abstract: Deploying deep learning models requires taking into consideration neural network metrics such as model size, inference latency, and FLOPs, aside from inference accuracy. This results in deep learning model designers leveraging multi-objective optimization to design effective deep neural networks under multiple criteria. However, applying multi-objective optimization to neural architecture search (NAS) is nontrivial because NAS tasks usually have a huge …

  • Variational Estimators of the Degree-corrected Latent Block Model for Bipartite Networks

    Updated: 2024-07-31 16:08:59
    Yunpeng Zhao, Ning Hao, Ji Zhu. 25(150):1–42, 2024. Abstract: Bipartite graphs are ubiquitous across various scientific and engineering fields. Simultaneously grouping the two types of nodes in a bipartite graph via biclustering represents a fundamental challenge in network analysis for such graphs. The latent block model (LBM) is a commonly used model-based tool for biclustering. However, the effectiveness of the LBM is often limited by the influence of row and column sums in the data matrix. To address this …

  • PyGOD: A Python Library for Graph Outlier Detection

    Updated: 2024-07-31 16:08:59
    Kay Liu, Yingtong Dou, Xueying Ding, Xiyang Hu, Ruitong Zhang, Hao Peng, Lichao Sun, Philip S. Yu. 25(141):1–9, 2024. Abstract: PyGOD is an open-source Python library for detecting outliers in graph data. As the first comprehensive library of its kind, PyGOD supports a wide array of leading graph-based methods for outlier detection under an easy-to-use, well-documented API designed for use by both researchers and practitioners. PyGOD provides modularized components of the different detectors implemented so that users can easily customize …
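
    A hypothetical usage sketch is below; the detector class, constructor argument, and attribute names follow PyGOD's documented pattern as best recalled here, and should be checked against the installed version of the library.

      # Hypothetical usage sketch -- verify names against the current PyGOD documentation.
      import torch
      from torch_geometric.data import Data
      from pygod.detector import DOMINANT   # one of the many detectors PyGOD wraps

      # A tiny random attributed graph as a PyTorch Geometric Data object.
      x = torch.randn(100, 16)
      edge_index = torch.randint(0, 100, (2, 400))
      data = Data(x=x, edge_index=edge_index)

      detector = DOMINANT(epoch=20)          # assumed constructor argument name
      detector.fit(data)                     # unsupervised training on the graph
      scores = detector.decision_score_      # per-node outlier scores (assumed attribute)
      labels = detector.predict(data)        # binary outlier labels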

  • Classification of Data Generated by Gaussian Mixture Models Using Deep ReLU Networks

    Updated: 2024-07-31 16:08:59
    Tian-Yi Zhou, Xiaoming Huo. 25(190):1–54, 2024. Abstract: This paper studies the binary classification of unbounded data from R^d generated under Gaussian mixture models (GMMs) using deep ReLU neural networks. We obtain, for the first time, non-asymptotic upper bounds and convergence rates of the excess risk (excess misclassification error) for the classification without restrictions on model parameters. While the majority of existing generalization analysis of classification algorithms relies on a bounded domain, …

  • Fermat Distances: Metric Approximation, Spectral Convergence, and Clustering Algorithms

    Updated: 2024-07-31 16:08:59
    Nicolás García Trillos, Anna Little, Daniel McKenzie, James M. Murphy. 25(176):1–65, 2024. Abstract: We analyze the convergence properties of Fermat distances, a family of density-driven metrics defined on Riemannian manifolds with an associated probability measure. Fermat distances may be defined either on discrete samples from the underlying measure, in which case they are random, or in the continuum setting, where they are induced by geodesics under a density-distorted Riemannian metric. We …

  • Nonparametric Regression Using Over-parameterized Shallow ReLU Neural Networks

    Updated: 2024-07-31 16:08:59
    Yunfei Yang, Ding-Xuan Zhou. 25(165):1–35, 2024. Abstract: It is shown that over-parameterized neural networks can achieve minimax optimal rates of convergence, up to logarithmic factors, for learning functions from certain smooth function classes, if the weights are suitably constrained or regularized. Specifically, we consider the nonparametric regression of estimating an unknown d-variate function by using shallow ReLU neural networks. It is assumed that the regression function is from the Hölder space with smoothness …

  • Linear Distance Metric Learning with Noisy Labels

    Updated: 2024-07-31 16:08:59
    Meysam Alishahi, Anna Little, Jeff M. Phillips. 25(121):1–53, 2024. Abstract: In linear distance metric learning, we are given data in one Euclidean metric space and the goal is to find an appropriate linear map to another Euclidean metric space which respects certain distance conditions as much as possible. In this paper, we formalize a simple and elegant method which reduces to a general continuous convex loss optimization problem, and for different noise models we derive the corresponding loss functions. We show that even if the data is noisy …

  • Representation Learning via Manifold Flattening and Reconstruction

    Updated: 2024-07-31 16:08:59
    Michael Psenka, Druv Pai, Vishal Raman, Shankar Sastry, Yi Ma. 25(132):1–47, 2024. Abstract: A common assumption for real-world, learnable data is its possession of some low-dimensional structure, and one way to formalize this structure is through the manifold hypothesis: that learnable data lies near some low-dimensional manifold. Deep learning architectures often have a compressive autoencoder component, where data is mapped to a lower-dimensional latent space, but often many architecture design choices are done by hand, …

  • Differentially Private Topological Data Analysis

    Updated: 2024-07-31 16:08:59
    Taegyu Kang, Sehwan Kim, Jinwon Sohn, Jordan Awan. 25(189):1–42, 2024. Abstract: This paper is the first to attempt differentially private (DP) topological data analysis (TDA), producing near-optimal private persistence diagrams. We analyze the sensitivity of persistence diagrams in terms of the bottleneck distance, and we show that the commonly used Čech complex has sensitivity that does not decrease as the sample size n increases. This makes it challenging for the persistence diagrams of Čech complexes to be privatized. As an alternative, we show …

  • Spherical Rotation Dimension Reduction with Geometric Loss Functions

    Updated: 2024-07-31 16:08:59
    Hengrui Luo, Jeremy E. Purvis, Didong Li. 25(175):1–55, 2024. Abstract: Modern datasets often exhibit high dimensionality, yet the data reside in low-dimensional manifolds that can reveal underlying geometric structures critical for data analysis. A prime example of such a dataset is a collection of cell cycle measurements, where the inherently cyclical nature of the process can be represented as a circle or sphere. Motivated by the need to analyze these types of datasets, we propose a nonlinear dimension reduction method, …

  • OpenBox: A Python Toolkit for Generalized Black-box Optimization

    Updated: 2024-07-31 16:08:59
    Black-box optimization (BBO) has a broad range of applications, including automatic machine learning, experimental design, and database knob tuning. However, users still face challenges when applying BBO methods to their problems at hand with existing software packages in terms of applicability, performance, and efficiency. This paper presents OpenBox, an open-source BBO toolkit with improved usability. It implements user-friendly interfaces and visualization for users to define and manage their tasks. The modular design behind OpenBox facilitates its flexible deployment in existing systems. Experimental results demonstrate the effectiveness and efficiency of OpenBox over existing systems. The source code of OpenBox is available at https://github.com/PKU-DAIR/open-box.
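
    A hypothetical quick-start sketch following the toolkit's documented interface pattern is below; the exact module, class, and return-format names are assumptions to verify against the current OpenBox documentation.

      # Hypothetical quick-start sketch -- verify names against the OpenBox docs.
      from openbox import Optimizer, space as sp

      # Define the search space.
      space = sp.Space()
      space.add_variables([sp.Real("x1", -5.0, 10.0), sp.Real("x2", 0.0, 15.0)])

      # Objective to minimize; the return format follows OpenBox's documented convention.
      def objective(config):
          x1, x2 = config["x1"], config["x2"]
          return {"objectives": [float((x1 - 2.0) ** 2 + (x2 - 3.0) ** 2)]}

      opt = Optimizer(objective, space, max_runs=30, task_id="quickstart_demo")
      history = opt.run()
      print(history)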

  • Bagging Provides Assumption-free Stability

    Updated: 2024-07-31 16:08:59
    Bagging is an important technique for stabilizing machine learning models. In this paper, we derive a finite-sample guarantee on the stability of bagging for any model. Our result places no assumptions on the distribution of the data, on the properties of the base algorithm, or on the dimensionality of the covariates. Our guarantee applies to many variants of bagging and is optimal up to a constant. Empirical results validate our findings, showing that bagging successfully stabilizes even highly unstable base algorithms.
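
    For readers who want the procedure in front of them, here is a minimal sketch of bagging itself (illustrative code, not the paper's construction): fit one base algorithm on bootstrap resamples and average the resulting predictions.

      import numpy as np
      from sklearn.tree import DecisionTreeRegressor

      rng = np.random.default_rng(0)
      X = rng.normal(size=(300, 3))
      y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=300)

      def bagged_predict(X_train, y_train, X_test, n_bags=50):
          """Average predictions of one base learner fit on n_bags bootstrap resamples."""
          preds = np.zeros((n_bags, len(X_test)))
          for b in range(n_bags):
              idx = rng.integers(0, len(X_train), size=len(X_train))  # bootstrap resample
              base = DecisionTreeRegressor().fit(X_train[idx], y_train[idx])
              preds[b] = base.predict(X_test)
          return preds.mean(axis=0)   # aggregation step that yields the stability guarantee

      X_test = rng.normal(size=(5, 3))
      print(bagged_predict(X, y, X_test))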

  • Nonparametric Copula Models for Multivariate, Mixed, and Missing Data

    Updated: 2024-07-31 16:08:59
    Joseph Feldman, Daniel R. Kowal. 25(164):1–50, 2024. Abstract: Modern data sets commonly feature both substantial missingness and many variables of mixed data types, which present significant challenges for estimation and inference. Complete case analysis, which proceeds using only the observations with fully-observed variables, is often severely biased, while model-based imputation of missing values is limited by the ability of the model to capture complex dependencies among possibly many variables of mixed data types. …

  • Generative Adversarial Ranking Nets

    Updated: 2024-07-31 16:08:59
    Yinghua Yao, Yuangang Pan, Jing Li, Ivor W. Tsang, Xin Yao. 25(119):1–35, 2024. Abstract: We propose a new adversarial training framework, generative adversarial ranking networks (GARNet), to learn from user preferences among a list of samples so as to generate data meeting user-specific criteria. Specifically, GARNet consists of two modules: a ranker and a generator. The generator fools the ranker to raise generated samples to the top, while the ranker learns to rank generated samples at the bottom. Meanwhile, the ranker learns to rank samples regarding the …

  • Adjusted Wasserstein Distributionally Robust Estimator in Statistical Learning

    Updated: 2024-07-31 16:08:59
    Yiling Xie, Xiaoming Huo. 25(148):1–40, 2024. Abstract: We propose an adjusted Wasserstein distributionally robust estimator, based on a nonlinear transformation of the Wasserstein distributionally robust (WDRO) estimator in statistical learning. The classic WDRO estimator is asymptotically biased, while our adjusted WDRO estimator is asymptotically unbiased, resulting in a smaller asymptotic mean squared error. Further, under certain conditions, our proposed adjustment technique provides a general principle to de-bias …

  • Predictive Inference with Weak Supervision

    Updated: 2024-07-31 16:08:59
    Maxime Cauchois, Suyash Gupta, Alnur Ali, John C. Duchi. 25(118):1–45, 2024. Abstract: The expense of acquiring labels in large-scale statistical machine learning makes partially and weakly labeled data attractive, though it is not always apparent how to leverage such data for model fitting or validation. We present a methodology to bridge the gap between partial supervision and validation, developing a conformal prediction framework to provide valid predictive confidence sets, that is, sets that cover a true label with a prescribed probability, independent of …

  • Fixed points of nonnegative neural networks

    Updated: 2024-07-31 16:08:59
    Tomasz J. Piotrowski, Renato L. G. Cavalcante, Mateusz Gabor. 25(139):1–40, 2024. Abstract: We use fixed point theory to analyze nonnegative neural networks, which we define as neural networks that map nonnegative vectors to nonnegative vectors. We first show that nonnegative neural networks with nonnegative weights and biases can be recognized as monotonic and weakly scalable mappings within the framework of nonlinear Perron-Frobenius theory. This fact enables us to provide conditions for the existence of fixed points of nonnegative neural networks …

  • An Analysis of Quantile Temporal-Difference Learning

    Updated: 2024-07-31 16:08:59
    Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney. 25(163):1–47, 2024. Abstract: We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning. Despite these empirical successes, a theoretical understanding of QTD has proven elusive until now. Unlike classical TD learning, which can be analysed …

  • An Entropy-Based Model for Hierarchical Learning

    Updated: 2024-07-31 16:08:59
    Amir R. Asadi. 25(187):1–45, 2024. Abstract: Machine learning, the predominant approach in the field of artificial intelligence, enables computers to learn from data and experience. In the supervised learning framework, accurate and efficient learning of dependencies between data instances and their corresponding labels requires auxiliary information about the data distribution and the target function. This central concept aligns with the notion of regularization in statistical learning theory. Real-world datasets are often characterized by …

  • Statistical Optimality of Divide and Conquer Kernel-based Functional Linear Regression

    Updated: 2024-07-31 16:08:59
    Jiading Liu, Lei Shi. 25(155):1–56, 2024. Abstract: Previous analysis of regularized functional linear regression in a reproducing kernel Hilbert space (RKHS) typically requires the target function to be contained in this kernel space. This paper studies the convergence performance of divide-and-conquer estimators in the scenario where the target function does not necessarily reside in the underlying RKHS. As a decomposition-based scalable approach, the divide-and-conquer estimators of functional linear regression …

  • Differentially Private Data Release for Mixed-type Data via Latent Factor Models

    Updated: 2024-07-31 16:08:59
    Yanqing Zhang, Qi Xu, Niansheng Tang, Annie Qu. 25(116):1–37, 2024. Abstract: Differential privacy is a particular data privacy-preserving technology which enables synthetic data or statistical analysis results to be released with a minimum disclosure of private information from individual records. The tradeoff between privacy-preserving and utility guarantee is always a challenge for differential privacy technology, especially for synthetic data generation. In this paper, we propose a differentially private data …

  • DoWhy-GCM: An Extension of DoWhy for Causal Inference in Graphical Causal Models

    Updated: 2024-07-31 16:08:59
    We present DoWhy-GCM, an extension of the DoWhy Python library, which leverages graphical causal models. Unlike existing causality libraries, which mainly focus on effect estimation, DoWhy-GCM addresses diverse causal queries, such as identifying the root causes of outliers and distributional changes, attributing causal influences to the data generating process of each node, or diagnosing causal structures. With DoWhy-GCM, users typically specify cause-effect relations via a causal graph, fit causal mechanisms, and pose causal queries, all with just a few lines of code. The general documentation is available at https://www.pywhy.org/dowhy and the DoWhy-GCM specific code at https://github.com/py-why/dowhy/tree/main/dowhy/gcm.
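
    The workflow described above might look roughly as follows; function and class names are taken from the DoWhy-GCM documentation as best recalled here and should be verified against the installed version.

      import networkx as nx
      import numpy as np
      import pandas as pd
      from dowhy import gcm

      # Synthetic data following the chain X -> Y -> Z.
      data = pd.DataFrame({"X": np.random.normal(size=1000)})
      data["Y"] = 2 * data["X"] + np.random.normal(size=1000)
      data["Z"] = 3 * data["Y"] + np.random.normal(size=1000)

      # Specify cause-effect relations via a causal graph, then fit causal mechanisms.
      causal_model = gcm.StructuralCausalModel(nx.DiGraph([("X", "Y"), ("Y", "Z")]))
      gcm.auto.assign_causal_mechanisms(causal_model, data)   # pick a mechanism per node
      gcm.fit(causal_model, data)

      # Example causal query: strength of each direct arrow into Z.
      print(gcm.arrow_strength(causal_model, target_node="Z"))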

  • Conformal Inference for Online Prediction with Arbitrary Distribution Shifts

    Updated: 2024-07-31 16:08:59
    Isaac Gibbs, Emmanuel J. Candès. 25(162):1–36, 2024. Abstract: We consider the problem of forming prediction sets in an online setting where the distribution generating the data is allowed to vary over time. Previous approaches to this problem suffer from over-weighting historical data and thus may fail to quickly react to the underlying dynamics. Here, we correct this issue and develop a novel procedure with provably small regret over all local time intervals of a given width. We achieve this by modifying the adaptive …
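
    For context, the baseline this line of work builds on is the adaptive conformal update, which nudges a working miscoverage level after each observation; the sketch below illustrates that basic update on synthetic scores (the paper's actual procedure modifies how adaptation is weighted over time, which is not reproduced here).

      import numpy as np

      rng = np.random.default_rng(0)
      alpha, gamma = 0.1, 0.01        # target miscoverage and adaptation step size
      alpha_t = alpha                 # working miscoverage level, updated online
      scores, covered = [], []

      for t in range(2000):
          # Nonconformity score of the new observation (here: |noise| from a drifting model).
          s_new = abs(rng.normal(scale=1.0 + 0.0005 * t))
          if len(scores) > 50:
              q = np.quantile(scores, 1 - np.clip(alpha_t, 0.001, 0.999))  # score quantile
              err = float(s_new > q)              # 1 if the prediction set missed
              covered.append(1 - err)
              alpha_t += gamma * (alpha - err)    # adaptive conformal update
          scores.append(s_new)

      print("empirical coverage:", np.mean(covered))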

  • The Non-Overlapping Statistical Approximation to Overlapping Group Lasso

    Updated: 2024-07-31 16:08:59
    Mingyu Qi, Tianxi Li. 25(115):1–70, 2024. Abstract: The group lasso penalty is widely used to introduce structured sparsity in statistical learning, characterized by its ability to eliminate predefined groups of parameters automatically. However, when the groups overlap, solving the group lasso problem can be time-consuming in high-dimensional settings due to the groups’ non-separability. This computational challenge has limited the applicability of the overlapping group lasso penalty in cutting-edge areas, such as gene pathway …

  • A flexible empirical Bayes approach to multiple linear regression and connections with penalized regression

    Updated: 2024-07-31 16:08:59
    Youngseok Kim, Wei Wang, Peter Carbonetto, Matthew Stephens. 25(185):1–59, 2024. Abstract: We introduce a new empirical Bayes approach for large-scale multiple linear regression. Our approach combines two key ideas: (i) the use of flexible adaptive shrinkage priors, which approximate the nonparametric family of scale mixtures of normal distributions by a finite mixture of normal distributions, and (ii) the use of variational approximations to efficiently estimate prior hyperparameters and compute …

  • Flexible Bayesian Product Mixture Models for Vector Autoregressions

    Updated: 2024-07-31 16:08:59
    Suprateek Kundu, Joshua Lukemire. 25(146):1–52, 2024. Abstract: Bayesian non-parametric methods based on Dirichlet process mixtures have seen tremendous success in various domains and are appealing in being able to borrow information by clustering samples that share identical parameters. However, such methods can face hurdles in heterogeneous settings where objects are expected to cluster only along a subset of axes or where clusters of samples share only a subset of identical parameters. We overcome such limitations by developing a …

  • More Efficient Estimation of Multivariate Additive Models Based on Tensor Decomposition and Penalization

    Updated: 2024-07-31 16:08:59
    Xu Liu, Heng Lian, Jian Huang. 25(161):1–27, 2024. Abstract: We consider parsimonious modeling of high-dimensional multivariate additive models using regression splines, with or without sparsity assumptions. The approach is based on treating the coefficients in the spline expansions as a third-order tensor. Note the data does not have tensor predictors or tensor responses, which distinguishes our study from the existing ones. A Tucker decomposition is used to reduce the number of parameters in …

  • Permuted and Unlinked Monotone Regression in R^d: an approach based on mixture modeling and optimal transport

    Updated: 2024-07-31 16:08:59
    Martin Slawski, Bodhisattva Sen. 25(183):1–57, 2024. Abstract: Suppose that we have a regression problem with response variable Y in R^d and predictor X in R^d, for d ≥ 1. In permuted or unlinked regression we have access to separate unordered data on X and Y, as opposed to data on (X, Y) pairs in usual regression. So far in the literature the case d = 1 has received attention, see e.g. the recent papers by Rigollet and Weed (Information and Inference, 8, 619–717) and …

  • Transport-based Counterfactual Models

    Updated: 2024-07-31 16:08:59
    Lucas De Lara, Alberto González-Sanz, Nicholas Asher, Laurent Risser, Jean-Michel Loubes. 25(136):1–59, 2024. Abstract: Counterfactual frameworks have grown popular in machine learning, both for explaining algorithmic decisions and for defining individual notions of fairness that are more intuitive than typical group fairness conditions. However, state-of-the-art models to compute counterfactuals are either unrealistic or unfeasible. In particular, while Pearl's causal inference provides appealing rules to calculate counterfactuals, it relies on a model that is …

  • On the Computational and Statistical Complexity of Over-parameterized Matrix Sensing

    Updated: 2024-07-31 16:08:59
    Jiacheng Zhuo, Jeongyeol Kwon, Nhat Ho, Constantine Caramanis. 25(169):1–47, 2024. Abstract: We consider solving the low-rank matrix sensing problem with the Factorized Gradient Descent (FGD) method when the specified rank is larger than the true rank. We refer to this as over-parameterized matrix sensing. If the ground-truth signal X* ∈ R^{d×d} is of rank r, but we try to recover it using F F^T where F ∈ R^{d×k} and k > r, the existing statistical analysis …
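
    A hypothetical numpy sketch of the over-parameterized FGD setup described above, under simplified assumptions (Gaussian sensing matrices, noiseless measurements, small random initialization); it illustrates the algorithm, not the paper's analysis.

      import numpy as np

      rng = np.random.default_rng(0)
      d, r, k, m = 20, 2, 5, 800      # ambient dim, true rank, over-specified rank, measurements

      F_star = rng.normal(size=(d, r))
      X_star = F_star @ F_star.T                        # rank-r ground truth
      A = rng.normal(size=(m, d, d))                    # Gaussian sensing matrices
      y = np.einsum("mij,ij->m", A, X_star)             # linear measurements <A_i, X*>

      F = 0.1 * rng.normal(size=(d, k))                 # over-parameterized factor, k > r
      eta = 0.005
      for _ in range(1500):
          residual = np.einsum("mij,ij->m", A, F @ F.T) - y
          G = np.einsum("m,mij->ij", residual, A) / m   # gradient w.r.t. X = F F^T
          F -= eta * (G + G.T) @ F                      # factorized gradient descent step

      print("relative error:", np.linalg.norm(F @ F.T - X_star) / np.linalg.norm(X_star))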

  • Spectral learning of multivariate extremes

    Updated: 2024-07-31 16:08:59
    Marco Avella Medina, Richard A. Davis, Gennady Samorodnitsky. 25(124):1–36, 2024. Abstract: We propose a spectral clustering algorithm for analyzing the dependence structure of multivariate extremes. More specifically, we focus on the asymptotic dependence of multivariate extremes characterized by the angular or spectral measure in extreme value theory. Our work studies the theoretical performance of spectral clustering based on a random k-nearest neighbor graph constructed from an extremal sample, i.e., the angular part of random vectors for which the …
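
    A hypothetical sketch of the pipeline the abstract outlines: keep the most extreme observations by norm, take their angular parts on the unit sphere, and run spectral clustering on a k-nearest-neighbor graph built from them (illustrative code on synthetic heavy-tailed data, not the paper's experiments).

      import numpy as np
      from sklearn.neighbors import kneighbors_graph
      from sklearn.cluster import SpectralClustering

      rng = np.random.default_rng(0)
      # Heavy-tailed sample concentrated around two extremal directions.
      directions = np.array([[1.0, 0.0], [0.0, 1.0]])[rng.integers(0, 2, size=2000)]
      radii = rng.pareto(2.0, size=2000) + 1.0
      X = radii[:, None] * (directions + 0.05 * rng.normal(size=(2000, 2)))

      # Keep the k_ext most extreme points and take their angular parts.
      k_ext = 200
      norms = np.linalg.norm(X, axis=1)
      idx = np.argsort(norms)[-k_ext:]
      angles = X[idx] / norms[idx, None]

      # k-nearest-neighbor graph on the angular parts, then spectral clustering.
      knn = kneighbors_graph(angles, n_neighbors=10, include_self=False)
      affinity = 0.5 * (knn + knn.T).toarray()          # symmetrized kNN affinity
      labels = SpectralClustering(n_clusters=2, affinity="precomputed").fit_predict(affinity)
      print(np.bincount(labels))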

  • Fat-Shattering Dimension of k-fold Aggregations

    Updated: 2024-07-31 16:08:59
    We provide estimates on the fat-shattering dimension of aggregation rules of real-valued function classes. The latter consists of all ways of choosing k functions, one from each of the k classes, and computing pointwise an "aggregate" function of these, such as the median, mean, and maximum. The bounds are stated in terms of the fat-shattering dimensions of the component classes. For linear and affine function classes, we provide a considerably sharper upper bound and a matching lower bound, achieving, in particular, an optimal dependence on k. Along the way, we improve several known results in addition to pointing out and correcting a number of erroneous claims in the literature.

  • Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits

    Updated: 2024-07-31 16:08:59
    Junpei Komiyama, Edouard Fouché, Junya Honda. 25(112):1–56, 2024. Abstract: We consider nonstationary multi-armed bandit problems where the model parameters of the arms change over time. We introduce the adaptive resetting bandit (ADR-bandit), a bandit algorithm class that leverages adaptive windowing techniques from the literature on data streams. We first provide new guarantees on the quality of estimators resulting from adaptive windowing techniques, which are of independent interest. Furthermore, we conduct a finite-time analysis of …

  • Sum-of-norms clustering does not separate nearby balls

    Updated: 2024-07-31 16:08:59
    Alexander Dunlap, Jean-Christophe Mourrat. 25(123):1–40, 2024. Abstract: Sum-of-norms clustering is a popular convexification of K-means clustering. We show that, if the dataset is made of a large number of independent random variables distributed according to the uniform measure on the union of two disjoint balls of unit radius, and if the balls are sufficiently close to one another, then sum-of-norms clustering will typically fail to recover the decomposition of the dataset into two clusters. As the dimension tends to infinity, this …
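
    For reference, sum-of-norms clustering solves the following convex relaxation of K-means (standard form from the literature, with data points a_1, ..., a_n and a fusion parameter λ; points whose minimizers x_i coincide are placed in the same cluster):

      \min_{x_1, \dots, x_n \in \mathbb{R}^d} \; \frac{1}{2} \sum_{i=1}^{n} \| x_i - a_i \|^2 \; + \; \lambda \sum_{i < j} \| x_i - x_j \|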

  • Adaptive Latent Feature Sharing for Piecewise Linear Dimensionality Reduction

    Updated: 2024-07-31 16:08:59
    Adam Farooq, Yordan P. Raykov, Petar Raykov, Max A. Little. 25(135):1–42, 2024. Abstract: Linear Gaussian exploratory tools such as principal component analysis (PCA) and factor analysis (FA) are widely used for exploratory analysis, pre-processing, data visualization, and related tasks. Because the linear-Gaussian assumption is restrictive, for very high dimensional problems, they have been replaced by robust, sparse extensions or more flexible discrete-continuous latent feature models. Discrete-continuous latent …

  • American Height and Weight

    Updated: 2024-07-30 09:50:05
    Using body mass index (BMI), which is calculated with height and weight, most people fall into the categories of overweight or obese. Tags: BMI, height, obesity, weight

  • Stunning New Data Visualization Examples in Our Curated Collection — DataViz Weekly

    Updated: 2024-07-26 18:53:45
    Welcome to DataViz Weekly, where we present a curated selection of stunning data visualization examples out there. Whether you’re a data professional or simply interested in visual data, these charts and maps can provide both inspiration and practical ideas. Here’s what we have for you to explore today: U.S. immigration patterns — WaPo; 2024 U.S. […] The post Stunning New Data Visualization Examples in Our Curated Collection — DataViz Weekly appeared first on AnyChart News.

  • Olympic data journalism

    Updated: 2024-07-26 11:34:43
    Speaking of the Olympics, Alberto Cairo and Simon Rogers talked about the warm-blooded… Tags: Alberto Cairo, data journalism, Olympics, Simon Rogers

  • ✚ Visualization Tools and Learning Resources, July 2024 Roundup

    Updated: 2024-07-25 18:30:06
    July 25, 2024. Topic: The Process, roundup. Welcome to The Process, the newsletter for FlowingData members that looks closer at how the charts get made. I’m Nathan Yau. Every month I collect tools and resources to help you make better charts. This is the good stuff for July. The Process is a weekly newsletter on how visualization tools, rules, and guidelines work in practice, published every Thursday; this issue is available to FlowingData members.

  • New Data Visualization Projects Worth Checking Out — DataViz Weekly

    Updated: 2024-07-19 13:04:02
    Data is easier to explore and analyze when visualized. If you’re looking for practical examples, you’ve arrived at the right place. DataViz Weekly is here to introduce you to some new data visualization projects we have found on the web. NYC congestion zone crash tracker — Transpo Maps; 2024 European Parliament election map — ZEIT […] The post New Data Visualization Projects Worth Checking Out — DataViz Weekly appeared first on AnyChart News.

  • Visualizing Forecast Accuracy, College Admissions, Global Demographics, and Election Results — DataViz Weekly

    Updated: 2024-07-12 09:21:54
    July 12th, 2024, by AnyChart Team. Ready for a fresh dose of impressive data visualizations crafted by seasoned professionals? Here’s what DataViz Weekly has in store for you this time: …

  • Celebrating Success at Qlik Connect: Recap from AnyChart

    Updated: 2024-07-10 08:19:59
    Qlik Connect 2024 was nothing short of amazing, not just for its vibrant atmosphere but also for the palpable successes we experienced. Beyond the buzz, we showcased our latest advancements for Qlik Sense, gathered a wealth of insights, made meaningful contacts, and ran a hit interactive game that became the talk of the event. We […] The post Celebrating Success at Qlik Connect: Recap from AnyChart appeared first on AnyChart News.

  • 2024 UK Election Maps — DataViz Weekly

    Updated: 2024-07-08 09:17:08
    Last Thursday’s 2024 United Kingdom general election resulted in a historic shift within the nation’s political landscape, marking the Conservatives’ most severe defeat in nearly two centuries. As people look for clarity on these changes, election maps have come to the forefront as effective visual tools to make sense of voting outcomes and underlying patterns. In this […] The post 2024 UK Election Maps — DataViz Weekly appeared first on AnyChart News.
